We analyze, both analytically and numerically, the self-organization of a system of "selfish" adaptive agents playing an arbitrary iterated pairwise game (defined by a 2 × 2 payoff matrix). Examples of possible games are the Prisoner's Dilemma (PD) game, the chicken game, the hero game, etc. The agents have no memory, use strategies based neither on direct reciprocity nor on 'tags', and are chosen at random, i.e. geographical vicinity is neglected. They can play two possible strategies: cooperate (C) or defect (D). The players measure their success by comparing their utilities with an estimate of the expected benefits and update their strategy following a simple rule.

Two versions of the model are studied: 1) the deterministic version (the agents are either in 
definite states C or D) and 2) the stochastic version (the agents have a probability c of playing C). 

Using a general Master Equation we compute the equilibrium states into which the system self-organizes, characterized by their average probability of cooperation $c_{eq}$. Depending on the payoff matrix, we show that $c_{eq}$ can take five different values.

We also consider the mixing of agents using two different payoff matrices and show that any value of $c_{eq}$ can be reached by tuning the proportions of agents using each payoff matrix. In particular, this can be used as a way to simulate the effect of a fraction d of "antisocial" individuals, incapable of realizing any value to cooperation, on the cooperative regime held by a population of neutral or "normal" agents.



PACS numbers: 89.75.-k, 87.23.Ge, 89.65.Gh, 89.75.Fb



I. INTRODUCTION 

Complex systems pervade our daily life. They are difficult to study because they don't exhibit simple cause-and-effect relationships and their interconnections are not easy to disentangle.

Game Theory has proved to be a very flexible tool to study complex systems. It coalesced in its normal form during the Second World War with the work of Von Neumann and Morgenstern, who first applied it in Economics.

Later, in the seventies, it was the turn of Biology, mainly with the work of J. Maynard-Smith […], who showed that Game Theory can be applied to various problems of evolution and proposed the concept of Evolutionarily Stable Strategy (ESS) as an important concept for understanding biological phenomena. Following rules dictated by game theory to attain an ESS requires neither consciousness nor a brain. Moreover, a recent experiment found that two variants of an RNA virus seem to engage in two-player games […].

This opens a new perspective: perhaps the dynamics of very simple agents, of the kind we know in Physics, can be modeled by Game Theory, providing an alternative approach to physical problems. For instance, energies could be represented as payoffs and phenomena like phase transitions understood as many-agent games. As a particular application of this line of thought we have recently seen a proliferation of papers addressing the issue of quantum games […], which might shed light on the hot issue of quantum computing. Conversely, Physics can be useful to understand the behavior of adaptive agents playing games used to model several complex systems in nature. For instance, in some interesting works Szabo et al. [9, 10] applied the sophisticated techniques developed in non-equilibrium statistical physics to spatial evolutionary games.

The most popular exponent of Game Theory is the Prisoner's Dilemma (PD) game, introduced in the early fifties by M. Flood and M. Dresher […] to model the social behavior of "selfish" individuals, i.e. individuals who pursue exclusively their own self-benefit.

The PD game is an example of a 2 × 2 game in normal form: i) there are 2 players, each confronting 2 choices, to cooperate (C) or to defect (D); ii) a 2 × 2 matrix specifies the payoffs of each player for the 4 possible outcomes [C,C], [C,D], [D,C] and [D,D] [20]; and iii) each player makes his choice without knowing what the other will do. A player who plays C gets the "reward" R or the "sucker's payoff" S depending on whether the other player plays C or D respectively, while if he plays D he gets the "temptation to defect" T or the "punishment" P depending on whether the other player plays C or D respectively. These four payoffs obey the relations:

$$T > R > P > S, \tag{1}$$

and

$$2R > S + T. \tag{2}$$

Thus, independently of what the other player does, by (1) defection D yields a higher payoff than cooperation C ($T > R$ and $P > S$) and is the dominant strategy. The outcome [D,D] is thus called a Nash equilibrium [12]. The dilemma is that if both defect, both do worse than if both had cooperated ($P < R$). Condition (2) is required in order that the average utility for each agent of a cooperative pair ($R$) be greater than the average utility for an exploiter-exploited pair ($(T + S)/2$).

Changing the rank order of the payoffs, i.e. the inequalities (1), gives rise to different games. A general taxonomy of 2 × 2 games (one-shot games involving two players with two actions each) was constructed by Rapoport and Guyer. A general 2 × 2 game is defined by a payoff matrix $M^{RSTP}$ with payoffs not necessarily obeying the conditions (1) or (2):

$$M^{RSTP} = \begin{pmatrix} (R,R) & (S,T) \\ (T,S) & (P,P) \end{pmatrix}. \tag{3}$$

The payoff matrix gives the payoffs for row actions when confronting column actions.

Apart from the PD game there are some other well-studied games. For instance, when the damage from mutual defection in the PD is increased so that it finally exceeds the damage suffered by being exploited:

$$T > R > S > P, \tag{4}$$

the new game is called the chicken game. Chicken is named after the car racing game: two cars drive towards each other for an apparent head-on collision. Each player can swerve to avoid the crash (cooperate) or keep going (defect). This game thus applies to situations in which mutual defection is the worst possible outcome (hence an unstable equilibrium).

When the reward of mutual cooperation in the chicken game is decreased so that it finally drops below the losses from being exploited:

$$T > S > R > P, \tag{5}$$

it transforms into the leader game. The name of the game stems from the following everyday situation: two car drivers want to enter a crowded one-way road from opposite sides; if a small gap occurs in the line of passing cars, it is preferable that one of them takes the lead and enters the gap rather than both waiting until a large gap occurs that allows both to enter simultaneously.
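The rank orderings above can be checked mechanically. The following Python sketch is only illustrative: the helper name `classify_game` is hypothetical (not from this work), and it recognizes just the orderings (1)-(2), (4) and (5), labelling every other matrix "other".

```python
def classify_game(R, S, T, P):
    """Name a symmetric 2x2 game from the rank order of its payoffs.

    Hypothetical helper: only the orderings discussed in the text are recognized.
    """
    if T > R > P > S and 2 * R > S + T:   # eqs. (1) and (2)
        return "prisoner's dilemma"
    if T > R > S > P:                     # eq. (4)
        return "chicken"
    if T > S > R > P:                     # eq. (5)
        return "leader"
    return "other"


print(classify_game(3, 0, 5, 1))  # canonical PD payoffs R=3, S=0, T=5, P=1
```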
In fact, every payoff matrix, even one which at first glance could seem unreasonable from the point of view of selfish individuals, can be applicable to describe real-life situations in different realms or contexts. Furthermore, "unreasonable" payoff matrices can be used by minorities of individuals who depart from the "normal" ones (assumed to be neutral), for instance absolutely D individuals incapable of realizing any value to cooperation, or absolutely C "altruistic" individuals (more on this later).

In one-shot or non-repeated games where each player has a dominant strategy, as in the PD, these strategies will generally be chosen. The situation becomes more interesting when the games are played repeatedly. In these iterated games players can modify their behavior with time in order to maximize their utilities as they play, i.e. they can adopt different strategies. In order to escape from the non-cooperative Nash equilibrium state of social dilemmas it is generally assumed that memory of previous interactions […], features ("tags") permitting cooperators and defectors to distinguish one another […], or spatial structure […] is required.

Recently, ref. […] proposed a simple model of selfish agents without memory of past encounters, without tags and with no spatial structure, playing an arbitrary 2 × 2 game defined by a general payoff matrix like (3). At a given time t, each of the $N_{ag}$ agents, numbered by an index i, has a probability $c_i(t)$ of playing C ($1 - c_i(t)$ of playing D). Then a pair of agents is selected at random to play. All the players use the same measure of success to evaluate whether they did well or badly in the game, which is based on a comparison of their utilities U with an estimate of the expected income e and the arithmetic mean of the payoffs $\mu = (R + S + T + P)/4$. Next, they update their $c_i(t)$ accordingly, i.e. a player keeps his $c_i(t)$ if he did well or modifies it if he did badly.

Our long-term goal is to study the quantum and statistical versions of this model. That is, on the one hand, to compare the efficiency and properties of quantum strategies vs. classical ones for this model, in a spirit similar to that of ref. […]. On the other hand, we are also interested in the effect of noise, for instance by introducing a Metropolis Monte Carlo temperature, and in the existence of power laws in the space of payoffs that parameterize the game, of the type found in ref. […] and […] for a spatially structured version of this model. Before embarking on the quantum or statistical mechanics of this model, the objective of this paper is to complete the study of its simplest non-spatial mean-field (MF) version. In particular, we present an analytic derivation of the equilibrium states for any payoff matrix, i.e. for an arbitrary 2 × 2 game, using elementary calculus, both for the deterministic and the stochastic versions. In the first case the calculation is elementary and serves as a guide to the more subtle computation for the stochastic model. These equilibrium states into which the system self-organizes, which depend on the payoff matrix, are of three types: "universal cooperation" or "all C", an intermediate level of cooperation, and "universal defection" or "all D", with, respectively, $c_{eq} = 1.0$, $0 < c_{eq} < 1.0$ and $c_{eq} = 0.0$. We also consider the effect of mixing players using two different payoff matrices. Specifically, a payoff matrix producing $c_{eq} = 0.0$ and the canonical payoff matrix are used to simulate, respectively, absolutely D or "antisocial" agents and "normal" agents.



II. THE MODEL 

We consider two versions of the model introduced in ref. […]. First, a deterministic version, in which the agents are always in definite states, either C or D, i.e. "black and white" agents without "gray tones". Nevertheless, it is often remarked that this is clearly an over-simplification of the behavior of individuals. Indeed, their levels of cooperation exhibit a continuous range of values. Furthermore, completely deterministic algorithms fail to incorporate the stochastic component of human behavior. Thus, we also consider a stochastic version, in which the agents only have probabilities for playing C. In other words, the variable denoting the state or "behavior" of the agents takes only two values in the deterministic case, $c_i = 1$ (C) or $c_i = 0$ (D), while in the stochastic case $c_i$ is a real variable in $[0, 1]$.

The pairs of players are chosen randomly instead of being restricted to some neighborhood. The implicit assumptions behind this are that the population is sufficiently large and the system connectivity is high. In other words, the agents display high mobility or they can experience interactions at a distance (for example, electronic transactions). This implies that $N_{ag}$, the number of agents, needs to be reasonably large. For instance, in the simulations presented in this work the population of agents is fixed to $N_{ag} = 1000$.

The update rule for the state of the agents is based on a comparison of their utilities with an estimate. The simplest estimate agent number k can make of his expected utility in the game is provided by the utility he would obtain by playing with himself [22], that is:

$$e_k^{RSTP}(t) = (R - S - T + P)\,c_k(t)^2 + (S + T - 2P)\,c_k(t) + P, \tag{6}$$

where $c_k$ is the probability that agent k plays C in the game. From equation (6) we see that the estimates for C-agents ($c_k = 1$), $e_C$, and for D-agents ($c_k = 0$), $e_D$, are given by

$$e_C = R, \qquad e_D = P. \tag{7}$$

The measure of success we consider here is slightly different from the one considered in ref. […]: to measure his success each player compares his profit $U_k(t)$ with the maximum between his estimate $e_k(t)$, given by (6), and the arithmetic mean of the four payoffs, $\mu = (R + S + T + P)/4$ […]. If $U_k^{RSTP}(t) \geq (<) \max\{e_k^{RSTP}(t), \mu\}$, the player assumes he is doing well (badly) and keeps (changes) his $c_k(t)$ as follows: if player k did well he assumes his $c_k(t)$ is adequate and he keeps it. On the other hand, if he did badly he assumes his $c_k$ is inadequate and he changes it (from C to D or from D to C in the deterministic version).
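To make the measure of success and the update rule concrete, here is a minimal Python sketch of a single encounter in the deterministic version. It assumes states are coded as 1 (C) and 0 (D); the function names are hypothetical and this is not the authors' code. The estimate is eq. (6) evaluated at the agent's own state, which reduces to eq. (7).

```python
def estimate(c, R, S, T, P):
    """Eq. (6): utility an agent with cooperation probability c expects against himself."""
    return (R - S - T + P) * c**2 + (S + T - 2 * P) * c + P


def play_deterministic(si, sj, R, S, T, P):
    """One encounter between agents in states si, sj (1 = C, 0 = D).

    Each player keeps his state if his payoff is >= max(estimate, mu)
    and flips it otherwise; the new pair of states is returned.
    """
    mu = (R + S + T + P) / 4.0
    payoff = {(1, 1): R, (1, 0): S, (0, 1): T, (0, 0): P}
    new_states = []
    for me, other in ((si, sj), (sj, si)):
        threshold = max(estimate(me, R, S, T, P), mu)  # e_C = R, e_D = P
        did_well = payoff[(me, other)] >= threshold
        new_states.append(me if did_well else 1 - me)
    return tuple(new_states)


# Canonical matrix: in [C,D] the C player defects, in [D,D] both switch to C.
print(play_deterministic(1, 0, 3, 0, 5, 1), play_deterministic(0, 0, 3, 0, 5, 1))
```

Note that $e(1/2) = \mu$ for any matrix, since an agent with $c = 1/2$ weights the four outcomes equally.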

We are interested in measuring the average probability of cooperation c vs. time, and in particular its equilibrium value $c_{eq}$, reached after a transient, which is equivalent to the final fraction of C-agents $f_C$.

III. COMPUTATION OF THE EQUILIBRIUM STATES 

A. Deterministic version 

For the deterministic case the values of $c_{eq}$ are obtained by elementary calculus as follows. Once equilibrium has been reached, the transitions from D to C must, on average, equal those from C to D. Thus, the average probability of cooperation $c_{eq}$ is obtained by equating the flux from C to D, $J_{CD}$, to the flux from D to C, $J_{DC}$. The players who play C get either R (in [C,C] encounters) or S (in [C,D] encounters), and their estimate is $e_C = R$; thus, according to the update rule, they change to D if $R < \mu$ or $S < \max\{R, \mu\}$ respectively. For a given average probability of cooperation c, [C,C] encounters occur with probability $c^2$ and [C,D] encounters with probability $c(1 - c)$. Consequently, $J_{CD}$ can be written as:

$$J_{CD} \propto a_{CC}\,c^2 + a_{CD}\,c(1 - c), \tag{8}$$

with

$$a_{CC} = \theta(\mu - R) \quad \text{and} \quad a_{CD} = \theta(\max\{R, \mu\} - S), \tag{9}$$

where $\theta(x)$ is the step function given by:

$$\theta(x) = \begin{cases} 1 & \text{if } x \geq 0, \\ 0 & \text{if } x < 0. \end{cases} \tag{10}$$
On the other hand, the players who play D get either T (in [D,C] encounters) or P (in [D,D] encounters), and their estimate is $e_D = P$; thus, according to the update rule, they change to C if $T < \max\{P, \mu\}$ or $P < \mu$ respectively. As [D,C] encounters occur with probability $(1 - c)c$ and [D,D] encounters with probability $(1 - c)^2$, $J_{DC}$ can be written as:

$$J_{DC} \propto a_{DC}\,(1 - c)c + a_{DD}\,(1 - c)^2, \tag{11}$$

with

$$a_{DD} = \theta(\mu - P) \quad \text{and} \quad a_{DC} = \theta(\max\{P, \mu\} - T). \tag{12}$$
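The coefficients (9) and (12) can be tabulated directly from the update rule. In this sketch (the function name is hypothetical) a coefficient is 1 when the corresponding payoff falls strictly below its threshold, i.e. when the agent changes state; this coincides with the θ expressions whenever no payoff ties exactly with its threshold.

```python
def flux_coefficients(R, S, T, P):
    """Return (a_CC, a_CD, a_DC, a_DD) of eqs. (9) and (12).

    A coefficient is 1 if an agent receiving that payoff changes state,
    i.e. if the payoff is below max(own estimate, mu), with e_C = R, e_D = P.
    """
    mu = (R + S + T + P) / 4.0
    a_cc = int(R < max(R, mu))   # C player facing C receives R
    a_cd = int(S < max(R, mu))   # C player facing D receives S
    a_dc = int(T < max(P, mu))   # D player facing C receives T
    a_dd = int(P < max(P, mu))   # D player facing D receives P
    return a_cc, a_cd, a_dc, a_dd


print(flux_coefficients(3, 0, 5, 1))  # canonical matrix
```

For the canonical matrix this gives $a_{CC} = a_{DC} = 0$ and $a_{CD} = a_{DD} = 1$, the values used in (17).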

In equilibrium

$$J_{CD}(c_{eq}) = J_{DC}(c_{eq}), \tag{13}$$

and thus we get a set of second-order algebraic equations for $c_{eq}$:

$$(a_{CC} - a_{CD} + a_{DC} - a_{DD})\,c_{eq}^2 + (a_{CD} - a_{DC} + 2a_{DD})\,c_{eq} - a_{DD} = 0. \tag{14}$$

As there are 2 possibilities for each coefficient $a_{XY}$, we have a total of $2^4 = 16$ different equations governing all the possible equilibrium states (actually there are 15, since this count includes the trivial equation $0 = 0$). The roots of these equations are:

$$c_{eq} = 0, \quad \frac{3 - \sqrt{5}}{2}, \quad \frac{1}{2}, \quad \frac{\sqrt{5} - 1}{2}, \quad 1. \tag{15}$$
In addition, we have to take into account the case in which

$$a_{CC} = a_{DD} = 0, \qquad a_{CD} = a_{DC} = 1. \tag{16}$$

In this case we can see from (8) and (11) that $J_{CD} = J_{DC}$ identically, so that $c_{eq} = c_0$ (where $c_0$ is the initial mean probability), whatever the initial conditions are.
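The count of possible equilibria can be checked by brute force. The sketch below enumerates all $2^4$ coefficient choices in (14), keeps the real roots in [0, 1], and flags the choices for which (14) collapses to $0 = 0$. The function name is ours, and the tolerance `tol` is an arbitrary choice to absorb rounding.

```python
import itertools
import math


def equilibrium_roots(a_cc, a_cd, a_dc, a_dd, tol=1e-12):
    """Real roots of eq. (14) lying in [0, 1]; None if (14) reduces to 0 = 0."""
    a = a_cc - a_cd + a_dc - a_dd
    b = a_cd - a_dc + 2 * a_dd
    c = -a_dd
    if a == 0 and b == 0:
        return None if c == 0 else []
    if a == 0:                       # linear equation b*x + c = 0
        candidates = [-c / b]
    else:                            # genuine quadratic
        disc = b * b - 4 * a * c
        if disc < 0:
            return []
        r = math.sqrt(disc)
        candidates = [(-b - r) / (2 * a), (-b + r) / (2 * a)]
    return [x for x in candidates if -tol <= x <= 1 + tol]


values, degenerate = set(), []
for coeffs in itertools.product((0, 1), repeat=4):
    roots = equilibrium_roots(*coeffs)
    if roots is None:
        degenerate.append(coeffs)
    else:
        values.update(round(x, 6) for x in roots)

print(sorted(values))   # the five values of eq. (15)
print(degenerate)       # (0, 0, 0, 0) and the case (16), (0, 1, 1, 0)
```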

For instance, for the canonical payoff matrix $M^{3051}$ ($R = 3$, $S = 0$, $T = 5$, $P = 1$) we have $a_{CC} = 0 = a_{DC}$ and $a_{CD} = 1 = a_{DD}$; therefore we get

$$c_{eq}(1 - c_{eq}) = (1 - c_{eq})^2, \tag{17}$$

with the root $c_{eq} = 1/2$ corresponding to the stable dynamic equilibrium in which the agents change their state in such a way that, on average, half of the transitions are from C to D and the other half from D to C.
B. Stochastic version



In the case of a continuous probability of cooperation $c_k$, the calculation is a little more subtle: now the estimate for agent k is not only R or P, as in the discrete case, but can take a continuum of values as the probability $c_k$ varies in the interval [0, 1]. From now on we will use the estimate as given in (6), but instead of a function of time we will use a generic e that is a function of the cooperation probability (and implicitly of time, of course), that is:

$$e^{RSTP}(c) = (R - S - T + P)\,c^2 + (S + T - 2P)\,c + P. \tag{18}$$

So we have:

$$e_k^{RSTP}(t) = e^{RSTP}(c_k(t)). \tag{19}$$

To calculate $c_{eq}$ we begin by writing a balance equation for the probability $c_i(t)$. The agents follow the same rule as before: they keep their state if they are doing well (in the sense explained earlier) and otherwise they change it. If two agents i and j play at time t, with probabilities $c_i(t)$ and $c_j(t)$ respectively, then the change in the probability $c_i$, provided he knows $c_j(t)$, would be given by:

$$\begin{aligned} c_i(t+1) - c_i(t) = &- c_i(t)\,c_j(t)\,\big[1 - \theta(R - e^{RSTP}(c_i(t)))\,\theta(R - \mu)\big] \\ &- c_i(t)\,[1 - c_j(t)]\,\big[1 - \theta(S - e^{RSTP}(c_i(t)))\,\theta(S - \mu)\big] \\ &+ [1 - c_i(t)]\,c_j(t)\,\big[1 - \theta(T - e^{RSTP}(c_i(t)))\,\theta(T - \mu)\big] \\ &+ [1 - c_i(t)]\,[1 - c_j(t)]\,\big[1 - \theta(P - e^{RSTP}(c_i(t)))\,\theta(P - \mu)\big], \end{aligned} \tag{20}$$

where θ is the step function (10). The equation of evolution for $c_j(t)$ is obtained by simply exchanging $i \leftrightarrow j$ in equation (20). Certainly, the assumption that each agent knows the probability of cooperation of his opponent is not realistic. Later, when we perform the simulations, we will introduce a procedure to estimate the opponent's probability (more on this in Section V.B).
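A direct transcription of eq. (20) may help when checking special cases. This is a sketch under the same assumption as the text (agent i knows $c_j$ exactly); the function names are hypothetical, and θ follows (10), with $\theta(0) = 1$.

```python
def estimate(c, R, S, T, P):
    """Eq. (18)."""
    return (R - S - T + P) * c**2 + (S + T - 2 * P) * c + P


def theta(x):
    """Step function of eq. (10): 1 for x >= 0, 0 for x < 0."""
    return 1.0 if x >= 0 else 0.0


def stochastic_update(ci, cj, R, S, T, P):
    """Return c_i(t+1) according to eq. (20), assuming agent i knows c_j."""
    mu = (R + S + T + P) / 4.0
    e = estimate(ci, R, S, T, P)

    def did_badly(X):
        # 1 - theta(X - e) * theta(X - mu): 1 if payoff X is below max(e, mu)
        return 1.0 - theta(X - e) * theta(X - mu)

    delta = (-ci * cj * did_badly(R)                    # [C,C]: may lower c_i
             - ci * (1.0 - cj) * did_badly(S)           # [C,D]: may lower c_i
             + (1.0 - ci) * cj * did_badly(T)           # [D,C]: may raise c_i
             + (1.0 - ci) * (1.0 - cj) * did_badly(P))  # [D,D]: may raise c_i
    return ci + delta


print(stochastic_update(0.2, 0.4, 3, 0, 5, 1))  # canonical matrix
```

For the canonical matrix the increment reduces to $[1 - c_j][1 - 2c_i]$, the same expression derived analytically for this case.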

In (20), if at time t the payoff X (= R, S, T or P) obtained by agent i is less than $\max\{e^{RSTP}(c_i(t)), \mu\}$, the first two terms on the RHS decrease the cooperation probability of agent i, while the last two terms increase it. The terms give no contribution if the payoff X is greater than or equal to $\max\{e^{RSTP}(c_i(t)), \mu\}$.

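We will use the canonical payoff matrix $M^{3051}$ to illustrate how the above equation of evolution for $c_i(t)$ works. In this case, the estimate function is, by (18):

$$e^{3051}(c) = -c^2 + 3c + 1, \tag{21}$$

thus it is easy to see that:

$$\begin{aligned} \theta(3 - e^{3051}(c)) &= 1 \quad \forall c \in [0, 1], \\ \theta(0 - e^{3051}(c)) &= 0 \quad \forall c \in [0, 1], \\ \theta(5 - e^{3051}(c)) &= 1 \quad \forall c \in [0, 1], \\ \theta(1 - e^{3051}(c)) &= 0 \quad \forall c \in (0, 1]. \end{aligned} \tag{22}$$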
In addition, for this case we have $\mu = 2.25$, thus:

$$\theta(3 - \mu) = 1, \qquad \theta(0 - \mu) = 0, \qquad \theta(5 - \mu) = 1, \qquad \theta(1 - \mu) = 0. \tag{23}$$

We can then write, to a very good approximation (we are assuming that the last line of (22) is also valid for $c = 0$):

$$c_i(t+1) - c_i(t) = -c_i(t)[1 - c_j(t)] + [1 - c_i(t)][1 - c_j(t)] = [1 - c_j(t)][1 - 2c_i(t)]. \tag{24}$$

Defining the mean probability of cooperation as

$$c(t) = \frac{1}{N_{ag}} \sum_{i=1}^{N_{ag}} c_i(t), \tag{25}$$

summing eq. (24) over i and j leads to:

$$c(t+1) - c(t) = [1 - c(t)][1 - 2c(t)] = 1 - 3c(t) + 2c(t)^2, \tag{26}$$

within an error of $O(1/N_{ag})$, since (24) is valid for all $i \neq j$ but we are summing over all the $N_{ag}$ agents.
From this we can calculate the equilibrium mean probability of cooperation $c_{eq}$:

$$0 = 1 - 3c_{eq} + 2c_{eq}^2, \tag{27}$$

obtaining the two roots:

$$c_{eq} = 1, \qquad c_{eq} = \frac{1}{2}, \tag{28}$$

of which $c_{eq} = 1/2$ is the stable solution. Hence we obtain the same result as in the deterministic case.
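The stability statement can be checked numerically. Reading eq. (26) as the drift of a continuous-time equation $dc/dt = [1 - c][1 - 2c]$ (an assumption of this sketch; the time step `dt`, the number of steps and the function name are our choices), explicit Euler integration shows every start in [0, 1) relaxing to 1/2:

```python
def integrate_mean_field(c0, dt=0.01, steps=5000):
    """Explicit Euler integration of dc/dt = (1 - c)(1 - 2c), the drift of eq. (26)."""
    c = c0
    for _ in range(steps):
        c += dt * (1.0 - c) * (1.0 - 2.0 * c)
    return c


for c0 in (0.0, 0.3, 0.9, 0.99, 1.0):
    print(c0, round(integrate_mean_field(c0), 6))
```

The trajectory started exactly at $c = 1$ stays there, as expected for the unstable root in (28), while any start with $c_0 < 1$, even $c_0 = 0.99$, ends at 1/2.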
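Using analogous reasoning for the general case, we can conclude that if every payoff $X \in \{R, S, T, P\}$ satisfies either

$$X < \mu \tag{29}$$

or

$$X \geq e_{max}^{RSTP}, \tag{30}$$

where $e_{max}^{RSTP}$ denotes the maximum of $e^{RSTP}(c)$ over $c \in [0, 1]$, the results for the mean cooperation probability for the deterministic version and the stochastic version are the same.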

There is an easy way to evaluate $e_{max}^{RSTP}$ in practice. It can be seen (see appendix) that if

$$S + T > 2\max\{R, P\} \;\Rightarrow\; e_{max}^{RSTP} = P + \frac{(S + T - 2P)^2}{4(S + T - R - P)}, \tag{31}$$
while, if

$$S + T \leq 2\max\{R, P\} \;\Rightarrow\; e_{max}^{RSTP} = \max\{R, P\}. \tag{32}$$
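As a quick numerical check of the closed forms (31) and (32), the sketch below compares them with a brute-force maximization of the estimate (18) over a uniform grid of c values. The grid size, the random integer test matrices and the function names are arbitrary choices of ours.

```python
import random


def estimate(c, R, S, T, P):
    """Eq. (18)."""
    return (R - S - T + P) * c**2 + (S + T - 2 * P) * c + P


def e_max(R, S, T, P):
    """Maximum of the estimate over c in [0, 1], using eqs. (31) and (32)."""
    if S + T > 2 * max(R, P):
        # interior maximum of a concave parabola at c* = (S+T-2P) / (2(S+T-R-P))
        return P + (S + T - 2 * P) ** 2 / (4.0 * (S + T - R - P))
    return float(max(R, P))          # maximum at an end point, c = 0 or c = 1


def e_max_grid(R, S, T, P, n=20001):
    """Brute-force check: evaluate the estimate on a uniform grid of [0, 1]."""
    return max(estimate(k / (n - 1), R, S, T, P) for k in range(n))


rng = random.Random(0)
for _ in range(5):
    R, S, T, P = (rng.randint(0, 10) for _ in range(4))
    print((R, S, T, P), e_max(R, S, T, P), round(e_max_grid(R, S, T, P), 6))
```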



When there is a payoff X such that

$$\mu \leq X < e_{max}^{RSTP}, \tag{33}$$

things can change, because agents who get X generally update their probability of cooperation $c_i(t)$ differently depending on whether $X < e(c_i)$ or $X \geq e(c_i)$. So, as the probability takes different values in the interval [0, 1], we have different equations of evolution, which somehow "compete" against each other in order to reach equilibrium. The different equations that can appear are of course restricted to the ones generated by the coefficients $a_{XY}$ as they appear in (14). It is then reasonable to expect that the final equilibrium value of the mean probability will lie somewhere between the equilibrium values of the competing equations. We will analyze some particular cases of this type in Section V.B to illustrate this point.

Although at first sight one may think that the universe of possibilities fulfilling condition (33) is very vast, it happens that no more than three different balance equations can coexist. This can be seen as follows: from eqs. (31) and (32), $e_{max}^{RSTP} \geq \max\{R, P\}$; besides, the estimate can never be greater than all the payoffs, so there is at least one X such that $e_{max}^{RSTP} \leq X$, and since μ is the arithmetic mean of the four payoffs, at least one payoff lies below μ (unless all four are equal). This leaves us with only two payoffs that can effectively lie between μ and $e_{max}^{RSTP}$, and this results in at most three balance equations playing in a given game.
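The classification into conditions (29)-(30) versus (33) is easy to automate. The sketch below (hypothetical function names, our own choice of test matrices) lists the payoffs falling in the window (33); the random check at the end probes the claim that at most two payoffs can fall there.

```python
import random


def e_max(R, S, T, P):
    """Eqs. (31)-(32): maximum of the estimate e(c) over c in [0, 1]."""
    if S + T > 2 * max(R, P):
        return P + (S + T - 2 * P) ** 2 / (4.0 * (S + T - R - P))
    return float(max(R, P))


def competing_payoffs(R, S, T, P):
    """Names of the payoffs satisfying condition (33), mu <= X < e_max.

    An empty list means every payoff obeys (29) or (30), so the stochastic
    and deterministic versions are expected to give the same c_eq.
    """
    mu = (R + S + T + P) / 4.0
    top = e_max(R, S, T, P)
    return [name for name, X in zip("RSTP", (R, S, T, P)) if mu <= X < top]


print(competing_payoffs(3, 0, 5, 1))   # canonical matrix
print(competing_payoffs(1, 2, 1, 0))   # R = 1, S = 2, T = 1, P = 0

rng = random.Random(7)
sizes = [len(competing_payoffs(*(rng.randint(-10, 10) for _ in range(4)))) for _ in range(2000)]
print(max(sizes))
```

The second example is one of the matrices studied in Section V.B, where R and T both fall in the window.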

IV. AN EXAMPLE OF COEXISTENCE OF AGENTS USING DIFFERENT PAYOFF 
MATRICES: COOPERATION IN PRESENCE OF "ALWAYS D" AGENTS 

Let us now analyze the situation where there is a mixture of agents using two different payoff matrices, each of which separately leads to a different value of $c_{eq}$. For simplicity we consider the deterministic version, but the results for the stochastic version are similar. We call "antisocial" those individuals for whom cooperation never pays and who thus, although they can initially be in the C state, turn to state D after playing and remain forever in this state. They can be represented by players using a payoff matrix that always updates $c_i$ to 0, for instance $M^{1053}$. Notice that these individuals are fundamentally different from those who use a payoff matrix fulfilling conditions (1) and (2) and who, even though they realize the value of cooperation, i.e. $R > P$ and $2R > T + S$, may often be tempted to "free ride" in order to get a higher payoff. However, with the proposed mechanism, which implies a sort of indirect reciprocity, when D grows above 50% it punishes this behavior, on average, more than C, thus favoring a net flux from D to C. Conversely, if C grows above 50% it punishes this behavior, on average, more than D, thus favoring the opposite flux from C to D. In other words, small oscillations around $f_C = 0.5$ occur. On the other hand, agents using $M^{1053}$ are "immune" to the former regulating mechanism. Let us analyze the effect they have on cooperation when they "contaminate" a population of neutral agents (using the canonical payoff matrix). In short, the two types of individuals play different games (specified by different payoff matrices) without knowing this fact, a situation which does not seem too far from real life.

The asymptotic average probabilities of cooperation can be obtained by simple algebra, combining the update rules for $M^{3051}$ and $M^{1053}$. The computation is completely analogous to the one which leads to (17). We have to calculate $J_{DC}$ and $J_{CD}$ as functions of the variable c and the parameter d, and by equating them at equilibrium we get the equation for $c_{eq}$. Only the fraction $(1 - d)$ of normal players using the canonical payoff matrix who play D against a player who also plays D (normal or antisocial) contributes to $J_{DC}$. That is, $J_{DC}$ is given by

$$J_{DC} \propto (1 - d)(1 - c)^2. \tag{34}$$

On the other hand, contributions to $J_{CD}$ come from one of these 3 types of encounters: i) [C,D], no matter whether the agents are neutral or antisocial; ii) [C,C] between two antisocial agents; and iii) [C,C] between a neutral and an antisocial agent (the neutral agent remains C and the antisocial one, who started at t = 0 playing C and has not played yet, changes from C to D). The respective weights of these 3 contributions are $c(1 - c)$, $d^2c^2$ and $d(1 - d)c^2$. Therefore, $J_{CD}$ is given by

$$J_{CD} \propto c(1 - c) + d^2c^2 + d(1 - d)c^2 = c(1 - c) + dc^2. \tag{35}$$
In equilibrium $J_{DC} = J_{CD}$ and the following equation for $c_{eq}$ arises:

$$(1 - d)(2c_{eq}^2 - 2c_{eq} + 1) - c_{eq} = 0, \tag{36}$$

and solving it:

$$c_{eq} = \frac{3 - 2d \pm \sqrt{1 + 4d - 4d^2}}{4(1 - d)}. \tag{37}$$
We must take the roots with the "−" sign, because those with "+" are greater than 1 for non-null values of d. We thus get the following table of $c_{eq}$ for different values of the parameter d:

d      c_eq
0.0    0.5000
0.1    0.4538
0.2    0.4123
0.3    0.3727
0.4    0.3333
0.5    0.2929
0.6    0.2500
0.7    0.2029
0.8    0.1492
0.9    0.0845
1.0    0.0

Table 1. $c_{eq}$ for agents using $M^{3051}$ contaminated by a fraction d of antisocial agents using $M^{1053}$.
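Table 1 follows directly from eq. (37). The short sketch below (hypothetical function names) recomputes it and checks that each value satisfies eq. (36); the case d = 1 is handled separately because (37) then becomes 0/0, while (36) reduces to $-c_{eq} = 0$.

```python
import math


def c_eq_mixed(d):
    """Eq. (37) with the minus sign; d is the fraction of antisocial agents."""
    if d == 1.0:
        return 0.0   # eq. (36) reduces to -c_eq = 0 when every agent is antisocial
    return (3 - 2 * d - math.sqrt(1 + 4 * d - 4 * d * d)) / (4 * (1 - d))


def residual(c, d):
    """Left-hand side of eq. (36); it should vanish at equilibrium."""
    return (1 - d) * (2 * c * c - 2 * c + 1) - c


for k in range(11):
    d = k / 10
    c = c_eq_mixed(d)
    print(f"d = {d:.1f}   c_eq = {c:.4f}   residual = {residual(c, d):.1e}")
```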



V. SIMULATIONS 



A. Deterministic version 



In this subsection we present some results produced by simulation for the deterministic version. Different payoff matrices were simulated, and it was found that the system self-organizes, after a transient, into equilibrium states in total agreement with those calculated in (15).
The update from $c_i(t)$ to $c_i(t+1)$ was dictated by a balance equation of the kind of (20). The measurements were performed over 1000 simulations each, and $c_{eq}$ denotes the average over these 1000 experiments. In order to show the independence from the initial distribution of probabilities of cooperation, Fig. 1 shows the evolution with time of the average probability of cooperation for different initial proportions of C-agents, $f_{C0}$, for the case of the canonical payoff matrix $M^{3051}$ (i.e. $R = 3$, $S = 0$, $T = 5$ and $P = 1$).
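The deterministic simulations can be reproduced in spirit with a few lines of Python. This is a minimal sketch under our own assumptions, not the code used for the figures: one randomly chosen pair plays per time step, both players apply the keep-or-flip rule of Section II with $e_C = R$ and $e_D = P$, and the number of steps, the initial fraction `f_c0` and the random seed are arbitrary choices; only $N_{ag} = 1000$ is taken from the text.

```python
import random


def simulate_deterministic(R, S, T, P, n_agents=1000, steps=20000, f_c0=0.9, seed=1):
    """Return the fraction of C-agents after each encounter (deterministic model)."""
    rng = random.Random(seed)
    mu = (R + S + T + P) / 4.0
    payoff = {(1, 1): R, (1, 0): S, (0, 1): T, (0, 0): P}
    estimate = {1: R, 0: P}                     # eq. (7): e_C = R, e_D = P
    state = [1 if rng.random() < f_c0 else 0 for _ in range(n_agents)]
    n_c = sum(state)
    history = []
    for _ in range(steps):
        i, j = rng.sample(range(n_agents), 2)   # random pair, no spatial structure
        si, sj = state[i], state[j]             # states before the encounter
        for k, me, other in ((i, si, sj), (j, sj, si)):
            if payoff[(me, other)] < max(estimate[me], mu):   # did badly: flip
                state[k] = 1 - me
                n_c += 1 if me == 0 else -1
        history.append(n_c / n_agents)
    return history


h = simulate_deterministic(3, 0, 5, 1)          # canonical matrix M^{3051}
print(round(sum(h[-10000:]) / 10000, 3))        # time average after the transient
```

Calling the same function with $M^{3501}$ or $M^{1053}$ drives the population to all C or all D, respectively.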




FIG. 1: c vs. time, for different initial values of $f_{C0}$, for the canonical payoff matrix.



Depending on the payoff matrix, the asymptotic equilibrium states can be of three types: "all C" ($c_{eq} = 1.0$), "all D" ($c_{eq} = 0.0$) or something in between ($0 < c_{eq} < 1$).

We have seen that the canonical payoff matrix $M^{3051}$ provides an example of a matrix which gives $c_{eq} = 0.5$.

Let us see examples of payoff matrices which produce other values of $c_{eq}$. A payoff matrix which produces $c_{eq} = 1.0$ is obtained simply by permuting the canonical values of S (0) and T (5), i.e. $M^{3501}$. For this matrix we have, by inspection of (9) and (12):

$$a_{CC} = a_{CD} = 0, \qquad a_{DC} = a_{DD} = 1. \tag{38}$$

Hence, after playing, a pair of agents always ends up in [C,C], since $J_{CD} = 0$ by (8).

On the other hand, a payoff matrix which leads to $c_{eq} = 0.0$ is obtained simply by permuting the canonical values of R (3) and P (1), i.e. $M^{1053}$, for which:

$$a_{CC} = a_{CD} = 1, \qquad a_{DC} = a_{DD} = 0. \tag{39}$$

That is, all the changes are from C to D, since in this case $J_{DC} = 0$.

The rate of convergence to the possible values of $c_{eq}$ depends on the values of $J_{CD}$ and $J_{DC}$. Fig. 2 shows the approach of the average probability of cooperation for different payoff matrices to their final 5 equilibrium values.
FIG. 2: Curves of c vs. time for different payoff matrices producing the 5 possible values of $c_{eq}$ (from top to bottom): payoff matrices $M^{3501}$ with $c_{eq} = 1$, $M^{…}$ with $c_{eq} \approx 0.62$, $M^{3051}$ with $c_{eq} = 0.5$, $M^{…}$ with $c_{eq} \approx 0.38$ and $M^{1053}$ with $c_{eq} = 0$.



Finally, we simulated the mixing of agents using payoff matrices $M^{3051}$ and $M^{1053}$. The evolution to equilibrium states for different fixed fractions d of agents using $M^{1053}$ is presented in Fig. 3. The results are in complete agreement with the asymptotic probabilities of cooperation which appear in Table 1.



B. Stochastic version 



In this case simulations were made updating the probability of cooperation according to eq. (20). However, as we anticipated, we have to change this equation slightly to reflect reality: two agents i and j interact and obtain the payoffs $X_i$ and $X_j$, respectively. For each of them there is no way, from this single event, to know the probability of cooperation $c_k$ of his opponent. What they can do then is (roughly) estimate this $c_k$ as follows. The average utility of player i in an encounter at time t with agent j is given by:

$$U_{ij}(t) = R\,c_i(t)c_j(t) + S\,c_i(t)[1 - c_j(t)] + T\,[1 - c_i(t)]c_j(t) + P\,[1 - c_i(t)][1 - c_j(t)]. \tag{40}$$

When he plays he gets the payoff $X_i$, so his best estimate $\hat{c}^i_j$ of the probability of agent j is obtained by replacing $U_{ij}(t)$ with $X_i$ in eq. (40). Then he will have:

$$\hat{c}^i_j(t) = \frac{X_i - P + c_i(t)(P - S)}{c_i(t)(R - S - T + P) + T - P}. \tag{41}$$

FIG. 3: The evolution of c with time, for different values of the fraction d of "antisocial" agents (using $M^{1053}$) embedded in a population of neutral agents (using the canonical payoff matrix).

Exchanging i for j in this equation gives the estimate of the probability $c_i(t)$ made by agent j. Equation (41) can return any value of $\hat{c}^i_j(t)$, not just values in the interval [0, 1], so it is necessary to make the following replacements:

$$\text{if } \hat{c}^i_j(t) > 1 \Rightarrow \hat{c}^i_j(t) = 1, \qquad \text{and if } \hat{c}^i_j(t) < 0 \Rightarrow \hat{c}^i_j(t) = 0. \tag{42}$$

When this happens, the agent is making the roughest approximation, which is to assume that the other player acts as in the deterministic case.
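A sketch of this opponent-estimation step: agent i inverts eq. (40) for $c_j$ using his realized payoff and then clips the result as in (42). The function names are hypothetical, and the treatment of a vanishing denominator in (41) (raising an error) is our own choice, since the text does not discuss that case.

```python
def expected_utility(ci, cj, R, S, T, P):
    """Eq. (40): average utility of agent i against agent j."""
    return (R * ci * cj + S * ci * (1 - cj)
            + T * (1 - ci) * cj + P * (1 - ci) * (1 - cj))


def estimate_opponent(x_i, ci, R, S, T, P):
    """Eq. (41) followed by the clipping (42): agent i's guess of c_j from payoff x_i."""
    denom = ci * (R - S - T + P) + T - P
    if denom == 0:
        # Assumption of this sketch: (41) is undefined here, so we refuse to guess.
        raise ValueError("eq. (41) has a vanishing denominator for this c_i")
    guess = (x_i - P + ci * (P - S)) / denom
    return min(1.0, max(0.0, guess))            # eq. (42)


# Canonical matrix, c_i = 0.3: inverting (40) recovers c_j; a realized T is clipped to 1.
u = expected_utility(0.3, 0.7, 3, 0, 5, 1)
print(estimate_opponent(u, 0.3, 3, 0, 5, 1), estimate_opponent(5, 0.3, 3, 0, 5, 1))
```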

For the canonical payoff matrix the result was the expected one, as this matrix obeys conditions (29)-(30): as predicted by the analytical calculation of Section III.B, the equilibrium mean probability is $c_{eq} = 1/2$, as in the deterministic case, despite the change introduced in (41). Simulations for other payoff matrices satisfying conditions (29) or (30) were also performed, and in all cases the deterministic results were recovered.

We will illustrate the case in which some payoff X satisfies

$$\mu \leq X < e_{max}^{RSTP} \tag{43}$$

with two particular examples. One of them is the case of the normalized matrix $M^{1S10}$, with S varying from 1 to 2, the limiting values beyond which condition (43) ceases to be valid. So for $S \leq 1$ the update equation is given simply by:

$$c_i(t+1) - c_i(t) = [1 - c_i(t)][1 - c_j(t)], \tag{44}$$

$c_{eq} = 1$ being the equilibrium in this case, while for $S > 2$:



$$c_i(t+1) - c_i(t) = 1 - c_i(t) - c_i(t)\,c_j(t), \tag{45}$$

for which $c_{eq} = (\sqrt{5} - 1)/2$ is the corresponding equilibrium value. When $S \in (1, 2]$ both balance equations play a role, and the general equation for the update follows from eq. (20) applied to this particular case:

$$c_i(t+1) - c_i(t) = [1 - c_i(t)][1 - c_j(t)] + [c_j(t) - 2c_i(t)c_j(t)]\,\big[1 - \theta(1 - e^{1S10}(c_i(t)))\big]. \tag{46}$$

So we can see that for $R = T = 1 \geq e$, eq. (46) reduces to (44), while if $R = T = 1 < e$ we obtain (45). When the simulation takes place, $c_j$ has to be replaced by $\hat{c}^i_j$.

The same analysis can be done for the matrices $M^{11T0}$, with T also varying from 1 to 2. In this case the other root, competing with $c_{eq} = 1$, is $c_{eq} = (3 - \sqrt{5})/2$.
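The competing roots quoted above can be located numerically. The sketch below evaluates the right-hand side of eq. (20) with $c_i = c_j = c$, i.e. a mean-field reading in which the opponent's probability is known exactly (so it ignores the estimate (41)), and finds the sign change of this drift by bisection. It assumes the drift is positive at c = 0 and changes sign only once, which holds for the three matrices tested; names and tolerances are ours.

```python
import math


def estimate(c, R, S, T, P):
    """Eq. (18)."""
    return (R - S - T + P) * c**2 + (S + T - 2 * P) * c + P


def mean_field_drift(c, R, S, T, P):
    """Right-hand side of eq. (20) with c_i = c_j = c."""
    mu = (R + S + T + P) / 4.0
    e = estimate(c, R, S, T, P)

    def did_badly(X):
        # 0 if X >= max(e, mu) (keep), 1 otherwise (change), since theta(0) = 1
        return 0.0 if (X >= e and X >= mu) else 1.0

    return (-c * c * did_badly(R) - c * (1 - c) * did_badly(S)
            + (1 - c) * c * did_badly(T) + (1 - c) * (1 - c) * did_badly(P))


def equilibrium(R, S, T, P, tol=1e-12):
    """Bisection on [0, 1] for the point where the drift changes from + to -."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_field_drift(mid, R, S, T, P) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(round(equilibrium(1, 2.1, 1, 0), 6), round((math.sqrt(5) - 1) / 2, 6))  # M^{1S10}, S = 2.1
print(round(equilibrium(1, 1, 2.1, 0), 6), round((3 - math.sqrt(5)) / 2, 6))  # M^{11T0}, T = 2.1
```

These mean-field roots are close to the simulated values at X = 2.1 in the table that follows; the drift towards 1/2 for larger X stems from the estimate (41) and is not captured by this sketch.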

The results of the simulations for both cases are presented in the next table; data for S > 2 and T > 2 (outside the range in which condition (43) holds) are also included […]:

X       c_eq for X = S    c_eq for X = T
1       1                 1
1.5     1                 1
1.9     1                 1
2       1                 1
2.1     0.617             0.383
4       0.581             0.370
8       0.556             0.403
16      0.530             0.467
1000    0.548             0.455


As can be seen from the data, for 1 < X < 2, that is, when condition (43) is valid, the results for the stochastic case are the same as they would be in the deterministic model. This is a consequence of the estimate (41) together with conditions (42).

For values of T and S greater than 2, for which condition (43) no longer holds, we can observe what at first may seem a curiosity: for T or S near 2 the equilibrium values of the deterministic case are recovered, as expected, but as we increase the values of T or S, the value $c_{eq} = 1/2$ is approached. After a little thought, it is clear that this is also a consequence of the estimate (41), since it depends on the payoffs. It can easily be seen that in the case of $M^{11T0}$:

if T>1 then 5}~0 Vi,j {forX.^T). (47) 

If we then take $c_j =$ in eq. (20), and remember that $T \to \infty$ implies $\mu \to \infty$, we obtain $c_{eq} = 1/2$. In an analogous way, for $M^{1S10}$:

$$\text{if } S \gg 1 \text{ then } \delta_i^j \approx 1 \;\; \forall\, i,j \quad (\text{for } X_i \neq S) \qquad (48)$$

which, together with eq. (20), again leads to $c_{eq} \approx 1/2$. The encounters for which $X_i = S$ or $T$ are responsible for the exact value $c_{eq} = 1/2$ not being attained. A similar analysis can be done when R or
VI. SUMMARY AND OUTLOOK



The proposed strategy, the combination of a measure of success and an update rule, produces cooperation for a wide variety of payoff matrices.
In particular, notice that: 

• A cooperative regime arises for payoff matrices representing "Social Dilemmas", like the canonical one. On the other hand, spatial game algorithms like the one of ref. [IS] produce cooperative states ($c_{eq} > 0$) in general only in the case of a "weak dilemma", in which $P = S = 0$, or at most when $P$ is significantly below $R$ [2^].

• Payoff matrices with $R = S = 0$, which, at least in principle, one would bet favor D, actually generate equilibrium states with $c_{eq} \neq 0$, provided that $P < 0$ (see eqs. (8)-(13)).

• Any value of the equilibrium average cooperation can be reached, in principle, even in the deterministic model, by appropriately mixing agents that use two different payoff matrices. This is an interesting result that goes beyond existing social settings: for instance, we have in mind situations in which one wants to design a device or mechanism with a given value of $c_{eq}$ that optimizes its performance.

• In this work we adopted a mean-field approach in which all spatial correlations between agents were neglected. One virtue of this simplification is that it shows that the model does not require agents to interact only with those within some geographical proximity in order to sustain cooperation, whereas playing with fixed neighbors is sometimes considered an important ingredient for successfully maintaining the cooperative regime [16, 19]. (Additionally, the equilibrium points can be obtained by simple algebra.)

To conclude, we mention different extensions and applications of this model as possible future work. At the beginning we mentioned "statistical mechanics" studies. For instance, dividing the four payoffs by, say, the reward $R$ reduces the number of parameters to three: $a = S/R$, $b = T/R$ and $d = P/R$, and we are interested in analyzing the dependence of $c_{eq}$ on each of these three parameters in the vicinity of a transition between two different values. It is also interesting to introduce noise into the system, by means of an inverse temperature parameter $\beta$, in order to allow irrational choices. Player $i$ changes his strategy with a probability $W_i$ given by



$$W_i = \frac{1}{1 + \exp[\beta(U_i - \tilde{\epsilon}_i)]}$$

where $\tilde{\epsilon}_i = \max\{\epsilon_i, \mu_i\}$.
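A minimal sketch of this noisy switching probability as a logistic (Fermi) function. The names `switch_probability`, `U`, `eps_tilde` and `beta` are hypothetical; `eps_tilde` stands for the threshold given by the max in the definition above. The function is written in a numerically stable form so that large values of $\beta$ do not overflow `math.exp`.

```python
import math


def switch_probability(U: float, eps_tilde: float, beta: float) -> float:
    """W_i = 1 / (1 + exp[beta * (U_i - eps_tilde_i)]): probability that player i
    changes strategy, given its utility U and the threshold eps_tilde."""
    x = beta * (U - eps_tilde)
    if x >= 0:
        # Equivalent form exp(-x) / (1 + exp(-x)) avoids overflow for large x.
        z = math.exp(-x)
        return z / (1.0 + z)
    return 1.0 / (1.0 + math.exp(x))


print(switch_probability(1.0, 1.0, 5.0))  # 0.5
```

For $\beta \to \infty$ the rule becomes deterministic (switch when $U_i < \tilde{\epsilon}_i$, keep otherwise), while $\beta = 0$ gives a coin flip regardless of the payoff.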

We are also planning a quantum extension of the model in order to deal with players who use superpositions of strategies, $a_C|C\rangle + a_D|D\rangle$, instead of definite strategies.

The study of the spatially structured version, and of how the different agents lump together, is also an interesting problem to consider. Results on that topic will be presented elsewhere.

Finally, testing the model against experimental data seems interesting. In the case of humans, experiments suggest, for a given class of games (i.e. a definite ranking of the payoffs), a dependence of $c$ on the relative weights of $R$, $S$, $T$ and $P$, which is not observed in the present model. Therefore, the update rule should be changed so as to capture this feature. Work is also in progress in that direction.






APPENDIX: Calculation of the maximum of the Gain Estimate function in the stochastic case.

We now show in detail the calculation of the maximum of the gain estimate function $e^{RSTP}(c)$, restricted to the interval [0, 1]. First we have to determine whether the function has a maximum in the open interval (0, 1). This can be done by noticing that, by (18), the function is concave (negative second derivative) under the condition:

$$R - S - T + P < 0 \qquad (49)$$

By setting

$$\frac{\partial e^{RSTP}}{\partial c} = 0 \qquad (50)$$

we find that the extremum of $e^{RSTP}(c)$ is attained at

$$c_0 = -\frac{1}{2}\,\frac{S + T - 2P}{R - S - T + P} \qquad (51)$$
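As a worked check of (50) and (51), here is the expansion under an assumption: the explicit form of $e^{RSTP}(c)$ is given by (18), outside this section, so we take it to be the expected payoff of a player who cooperates with probability $c$ against an opponent with the same probability. This form agrees with $e^{RSTP}(0) = P$ and $e^{RSTP}(1) = R$ and reproduces the concavity condition (49).

$$
\begin{aligned}
e^{RSTP}(c) &= R\,c^2 + (S+T)\,c(1-c) + P\,(1-c)^2 = (R-S-T+P)\,c^2 + (S+T-2P)\,c + P, \\
\frac{\partial e^{RSTP}}{\partial c} &= 2(R-S-T+P)\,c + (S+T-2P) = 0
\;\Longrightarrow\; c_0 = -\frac{1}{2}\,\frac{S+T-2P}{R-S-T+P}, \\
\frac{\partial^2 e^{RSTP}}{\partial c^2} &= 2(R-S-T+P).
\end{aligned}
$$

Substituting $c_0$ back into the quadratic gives $P - (S+T-2P)^2/[4(R-S-T+P)]$, the maximum value quoted in (54).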

Imposing $c_0 > 0$ and $c_0 < 1$, and using (49) for consistency, we obtain:

$$S + T > 2P, \qquad S + T > 2R \qquad (52)$$

Notice that adding these two conditions yields condition (49). In turn, (52) can be expressed as

$$S + T > 2\max\{R, P\} \qquad (53)$$
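so this inequality summarizes (49) and (52). It can be seen that if (53) is fulfilled, $e^{RSTP}_{max} > \max\{R, P\}$ always, since the maximum then lies strictly inside (0, 1) while $e^{RSTP}(0) = P$ and $e^{RSTP}(1) = R$.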

So if condition (53) holds, the maximum of $e^{RSTP}(c)$ lies in the interval (0, 1) and its value, as a function of the parameters $R$, $S$, $T$ and $P$, is:

$$e^{RSTP}_{max} = P - \frac{1}{4}\,\frac{(S + T - 2P)^2}{R - S - T + P} \qquad (54)$$



On the other hand, if

$$S + T \le 2\max\{R, P\} \qquad (55)$$

then

$$e^{RSTP}_{max} = \max\{R, P\} \qquad (56)$$

since $e^{RSTP}(0) = P$ and $e^{RSTP}(1) = R$.
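The case split (53) to (56) can be checked numerically against a brute-force scan of [0, 1]. Since the explicit form of $e^{RSTP}(c)$ is defined outside this section, `gain_estimate` below assumes $e^{RSTP}(c) = Rc^2 + (S+T)c(1-c) + P(1-c)^2$, which agrees with $e^{RSTP}(0) = P$, $e^{RSTP}(1) = R$ and with the extremum (51); all function names are hypothetical.

```python
def gain_estimate(c, R, S, T, P):
    """Assumed form of e^{RSTP}(c): expected payoff against an opponent that
    cooperates with the same probability c."""
    return R * c * c + (S + T) * c * (1 - c) + P * (1 - c) ** 2


def gain_estimate_max(R, S, T, P):
    """Maximum of gain_estimate on [0, 1] using the case split (53)/(55)."""
    if S + T > 2 * max(R, P):
        # Interior maximum, eq. (54); (53) implies R - S - T + P < 0, so no division by zero.
        return P - 0.25 * (S + T - 2 * P) ** 2 / (R - S - T + P)
    # Otherwise the maximum sits at an endpoint, eq. (56).
    return max(R, P)


def brute_force_max(R, S, T, P, n=20_000):
    """Maximum of gain_estimate over an evenly spaced grid of n + 1 points in [0, 1]."""
    return max(gain_estimate(k / n, R, S, T, P) for k in range(n + 1))


print(gain_estimate_max(1, 1, 4, 0))  # 1.5625
```

For example, with $R = S = 1$, $T = 4$, $P = 0$, condition (53) holds ($5 > 2$), the maximum sits at $c_0 = 0.625$ and its value $25/16$ exceeds both endpoint values, as stated after (53). For the canonical Prisoner's Dilemma values $R = 3$, $S = 0$, $T = 5$, $P = 1$, condition (55) holds instead and the maximum is $R = 3$.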